Geo-parsing Messages from Microtext
نویسندگان
چکیده
Widespread use of social media during crises has become commonplace, as shown by the volume of messages during the Haiti earthquake of 2010 and Japan tsunami of 2011. Location mentions are particularly important in disaster messages as they can show emergency responders where problems have occurred. This article explores the sorts of locations that occur in disaster-related social messages, how well off-theshelf software identifies those locations, and what is needed to improve automated location identification, called geo-parsing. To do this, we have sampled Twitter messages from the February 2011 earthquake in Christchurch, Canterbury, New Zealand. We annotated locations in messages manually to make a gold standard by which to measure locations identified by a Named Entity Recognition software. The Stanford NER software found some locations that were proper nouns, but did not identify locations that were not capitalized, local streets and buildings, or nonstandard place abbreviations and mis-spellings that are plentiful in microtext. We review how these problems might be solved in software research, and model a readable crisis map that shows crisis location clusters via enlarged place labels.
منابع مشابه
An algorithm for local geoparsing of microtext
The location of the author of a social media message is not invariably the same as the location that the author writes about in the message. In applications that mine these messages for information such as tracking news, political events or responding to disasters, it is the geographic content of the message rather than the location of the author that is important. To this end, we present a met...
متن کاملReports of the 2013 AAAI Spring Symposium Series
Much progress has been made in recent years in several areas within natural language processing. However, so far there has been less work related to microtext (for example, instant messaging, transcribed speech, and microblogs such as Twitter and Facebook). Microtext is made up of semistructured pieces of text that are distinguished by their brevity, informality, idiosyncratic lexicon, nonstand...
متن کاملNormalizing Microtext
The use of computer mediated communication has resulted in a new form of written text—Microtext—which is very different from well-written text. Tweets and SMS messages, which have limited length and may contain misspellings, slang, or abbreviations, are two typical examples of microtext. Microtext poses new challenges to standard natural language processing tools which are usually designed for ...
متن کاملLearning Ontologies from the Web for Microtext Processing
We build a mechanism to form an ontology of entities which improves a relevance of matching and searching microtext. Ontology construction starts from the seed entities and mines the web for new entities associated with them. To form these new entities, machine learning of syntactic parse trees (syntactic generalization) is applied to form commonalities between various search results for existi...
متن کاملClustering Microtext Streams for Event Identification
The popularity of microblogging systems has resulted in a new form of Web data – microtext – which is very different from conventional well-written text. Microtext often has the characteristics of informality, brevity, and varied grammar, which poses new challenges in applying traditional clustering algorithms to analyze microtext. In this paper, we propose a novel two-phase approach for cluste...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Trans. GIS
دوره 15 شماره
صفحات -
تاریخ انتشار 2011